Language Independent NER using a Maximum Entropy Tagger

Authors

  • James R. Curran
  • Stephen Clark
Abstract

Named Entity Recognition (NER) systems need to integrate a wide variety of information for optimal performance. This paper demonstrates that a maximum entropy tagger can effectively encode such information and identify named entities with very high accuracy. The tagger uses features which can be obtained for a variety of languages and works effectively not only for English, but also for other languages such as German and Dutch.
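
For context, a conditional maximum entropy model in its standard textbook form scores a candidate tag y for a context x as sketched below. This is a general illustration, not the specific formulation used in the paper: the feature functions f_i, the weights \lambda_i, and the normalizer Z(x) are generic symbols assumed here, since the abstract does not list the actual features.

```latex
p(y \mid x) = \frac{1}{Z(x)} \exp\Big( \sum_{i=1}^{n} \lambda_i f_i(x, y) \Big),
\qquad
Z(x) = \sum_{y'} \exp\Big( \sum_{i=1}^{n} \lambda_i f_i(x, y') \Big)
```

In a tagger of this kind, each f_i is typically a binary indicator that pairs a property of the context (for example, whether the current word is capitalized) with a candidate tag, which is one way diverse sources of information can be combined in a single model.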

Similar resources

Recognizing named entities in spoken Chinese dialogues with a character-level maximum entropy tagger

Named Entity Recognition (NER) is an important task in information extraction, where major attention has been paid to written texts of a news or academic paper (esp. biomedical) style. In this paper we report the first piece of work on NER in spoken Chinese dialogues, as a preliminary step for spoken language understanding. The NER task is taken as a sequential classification problem and solved...

Voted NER System using Appropriate Unlabeled Data

This paper reports a voted Named Entity Recognition (NER) system with the use of appropriate unlabeled data. The proposed method is based on the classifiers such as Maximum Entropy (ME), Conditional Random Field (CRF) and Support Vector Machine (SVM) and has been tested for Bengali. The system makes use of the language independent features in the form of different contextual and orthographic wo...

Language Independent Named Entity Recognition

The role of Internet in personal, economic and political advancement is growing in a fast pace. By the turn of century, data on web reaches to petabytes or exabytes or may even scale up-to unimaginable quantities. Extraction of precise and structured information from such large amounts of unstructured or semi-structured data is the major concern of web known as Information Extraction. Named ent...

Maximum Entropy Part-of-Speech Tagging in NLTK

In this paper we implement a part of speech tagger for NLTK using maximum entropy methods. Our tagger can be used as a drop-in replacement for any of the other NLTK taggers. We give a brief tutorial on how to use our tagger as well as describing the implementation at a high level. We evaluate our tagger on the Penn Tree Bank and compare our results to those of previous work.

The Tanl Tagger for Named Entity Recognition on Transcribed Broadcast News at Evalita 2011

The Tanl tagger is a flexible sequence labeller based on Conditional Markov Model that can be configured to use different classifiers and to extract features according to feature templates expressed through patterns provided in a configuration file. The Tanl Tagger was applied to the task of Named Entity Recognition (NER) on Transcribed Broadcast News of Evalita 2011. The goal of the task was t...

Publication date: 2003